Goto

Collaborating Authors

 markovian model


Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences

arXiv.org Machine Learning

Residential electricity demand at granular scales is driven by what people do and for how long. Accurately forecasting this demand for applications like microgrid management and demand response therefore requires generative models that can produce realistic daily activity sequences, capturing both the timing and duration of human behavior. This paper develops a generative model of human activity sequences using nationally representative time-use diaries at a 10-minute resolution. We use this model to quantify which demographic factors are most critical for improving predictive performance. We propose a hierarchical semi-Markov framework that addresses two key modeling challenges. First, a time-inhomogeneous Markov \emph{router} learns the patterns of ``which activity comes next." Second, a semi-Markov \emph{hazard} component explicitly models activity durations, capturing ``how long" activities realistically last. To ensure statistical stability when data are sparse, the model pools information across related demographic groups and time blocks. The entire framework is trained and evaluated using survey design weights to ensure our findings are representative of the U.S. population. On a held-out test set, we demonstrate that explicitly modeling durations with the hazard component provides a substantial and statistically significant improvement over purely Markovian models. Furthermore, our analysis reveals a clear hierarchy of demographic factors: Sex, Day-Type, and Household Size provide the largest predictive gains, while Region and Season, though important for energy calculations, contribute little to predicting the activity sequence itself. The result is an interpretable and robust generator of synthetic activity traces, providing a high-fidelity foundation for downstream energy systems modeling.


Quantum open system identification via global optimization: Optimally accurate Markovian models of open systems from time-series data

arXiv.org Artificial Intelligence

Accurate models of the dynamics of quantum circuits are essential for optimizing and advancing quantum devices. Since first-principles models of environmental noise and dissipation in real quantum systems are often unavailable, deriving accurate models from measured time-series data is critical. However, identifying open quantum systems poses significant challenges: powerful methods from systems engineering can perform poorly beyond weak damping (as we show) because they fail to incorporate essential constraints required for quantum evolution (e.g., positivity). Common methods that can include these constraints are typically multi-step, fitting linear models to physically grounded master equations, often resulting in non-convex functions in which local optimization algorithms get stuck in local extrema (as we show). In this work, we solve these problems by formulating quantum system identification directly from data as a polynomial optimization problem, enabling the use of recently developed global optimization methods. These methods are essentially guaranteed to reach global optima, allowing us for the first time to efficiently obtain the most accurate Markovian model for a given system. In addition to its practical importance, this allows us to take the error of these Markovian models as an alternative (operational) measure of the non-Markovianity of a system. We test our method with the spin-boson model -- a two-level system coupled to a bath of harmonic oscillators -- for which we obtain the exact evolution using matrix-product-state techniques. We show that polynomial optimization using moment/sum-of-squares approaches significantly outperforms traditional optimization algorithms, and we show that even for strong damping Lindblad-form master equations can provide accurate models of the spin-boson system.


Do Transformer World Models Give Better Policy Gradients?

arXiv.org Artificial Intelligence

A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimization landscapes that are easier to navigate even when compared to those from the simulator itself. This property allows transformer AWMs to produce better policies than competitive baselines in realistic long-horizon tasks.


A Two-Scale Complexity Measure for Deep Learning Models

arXiv.org Artificial Intelligence

We introduce a novel capacity measure 2sED for statistical models based on the effective dimension. The new quantity provably bounds the generalization error under mild assumptions on the model. Furthermore, simulations on standard data sets and popular model architectures show that 2sED correlates well with the training error. For Markovian models, we show how to efficiently approximate 2sED from below through a layerwise iterative approach, which allows us to tackle deep learning models with a large number of parameters. Simulation results suggest that the approximation is good for different prominent models and data sets.


Diffusion of Credit in Markovian Models

Neural Information Processing Systems

This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs) and how it makes very difficult the task of learning of long-term dependencies in sequences. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.


Direct and Indirect Effects

arXiv.org Artificial Intelligence

The direct effect of one event on another can be defined and measured by holding constant all intermediate variables between the two. Indirect effects present conceptual and practical difficulties (in nonlinear models), because they cannot be isolated by holding certain variables constant. This paper presents a new way of defining the effect transmitted through a restricted set of paths, without controlling variables on the remaining paths. This permits the assessment of a more natural type of direct and indirect effects, one that is applicable in both linear and nonlinear models and that has broader policy-related interpretations. The paper establishes conditions under which such assessments can be estimated consistently from experimental and nonexperimental data, and thus extends path-analytic techniques to nonlinear and nonparametric models.


An Event-Based Framework for Process Inference

AAAI Conferences

We focus on a class of models used for representing the dynamics between a discrete set of probabilistic events in a continuous-time setting. The proposed framework offers tractable learning and inference procedures and provides compact state representations for processes which exhibit variable delays between events. The approach is applied to a heart sound labeling task that exhibits long-range dependencies on previous events, and in which explicit modeling of the rhythm timings is justifiable by cardiological principles.


Diffusion of Context and Credit Information in Markovian Models

Journal of Artificial Intelligence Research

This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes very difficult the task of learning to represent long-term context for sequential data. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.